sample_superstore DatasetIn this lab, you will work with the sample_superstore
dataset to practice creating, inspecting, and manipulating dataframes.
Follow the steps below to complete the tasks.
Load the sample_superstore dataset into a dataframe.
You can find this dataset in the datasets package or
download it as a CSV file from online sources.
Save the dataframe as superstore.
Hint: Use the read.csv() or
read_excel() function to import the dataset.
df <- read.csv("dataset/Sample - Superstore.csv")
head(df)
head() function.head(df, 10)
str() function to inspect the structure of the
dataframe. What are the data types of the columns?str(df)
'data.frame': 9994 obs. of 21 variables:
$ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
$ Order.Date : chr "11/8/2016" "11/8/2016" "6/12/2016" "10/11/2015" ...
$ Ship.Date : chr "11/11/2016" "11/11/2016" "6/16/2016" "10/18/2015" ...
$ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
$ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
$ Customer.Name: chr "Claire Gute" "Claire Gute" "Darrin Van Huff" "Sean O'Donnell" ...
$ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
$ Country : chr "United States" "United States" "United States" "United States" ...
$ City : chr "Henderson" "Henderson" "Los Angeles" "Fort Lauderdale" ...
$ State : chr "Kentucky" "Kentucky" "California" "Florida" ...
$ Postal.Code : int 42420 42420 90036 33311 33311 90032 90032 90032 90032 90032 ...
$ Region : chr "South" "South" "West" "South" ...
$ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
$ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
$ Sub.Category : chr "Bookcases" "Chairs" "Labels" "Tables" ...
$ Product.Name : chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
$ Sales : num 262 731.9 14.6 957.6 22.4 ...
$ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
$ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
$ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
summary() function to get a summary of the
dataframe. What insights can you gather from the summary?summary(df)
Row.ID Order.ID Order.Date Ship.Date
Min. : 1 Length:9994 Length:9994 Length:9994
1st Qu.:2499 Class :character Class :character Class :character
Median :4998 Mode :character Mode :character Mode :character
Mean :4998
3rd Qu.:7496
Max. :9994
Ship.Mode Customer.ID Customer.Name Segment
Length:9994 Length:9994 Length:9994 Length:9994
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Country City State Postal.Code
Length:9994 Length:9994 Length:9994 Min. : 1040
Class :character Class :character Class :character 1st Qu.:23223
Mode :character Mode :character Mode :character Median :56431
Mean :55190
3rd Qu.:90008
Max. :99301
Region Product.ID Category Sub.Category
Length:9994 Length:9994 Length:9994 Length:9994
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Product.Name Sales Quantity Discount
Length:9994 Min. : 0.444 Min. : 1.00 Min. :0.0000
Class :character 1st Qu.: 17.280 1st Qu.: 2.00 1st Qu.:0.0000
Mode :character Median : 54.490 Median : 3.00 Median :0.2000
Mean : 229.858 Mean : 3.79 Mean :0.1562
3rd Qu.: 209.940 3rd Qu.: 5.00 3rd Qu.:0.2000
Max. :22638.480 Max. :14.00 Max. :0.8000
Profit
Min. :-6599.978
1st Qu.: 1.729
Median : 8.666
Mean : 28.657
3rd Qu.: 29.364
Max. : 8399.976
Sales column as a vector using the
$ operator.df$Sales
[1] 261.9600 731.9400 14.6200 957.5775 22.3680 48.8600 7.2800 907.1520
[9] 18.5040 114.9000 1706.1840 911.4240 15.5520 407.9760 68.8100 2.5440
[17] 665.8800 55.5000 8.5600 213.4800 22.7200 19.4600 60.3400 71.3720
[25] 1044.6300 11.6480 90.5700 3083.4300 9.6180 124.2000 3.2640 86.3040
[33] 6.8580 15.7600 29.4720 1097.5440 190.9200 113.3280 532.3992 212.0580
[41] 371.1680 147.1680 77.8800 95.6160 45.9800 17.4600 211.9600 45.0000
[49] 21.8000 38.2200 75.1800 6.1600 89.9900 15.2600 1029.9500 208.5600
[57] 32.4000 319.4100 14.5600 30.0000 48.4800 1.6800 13.9800 25.8240
[65] 146.7300 79.7600 213.1150 1113.0240 167.9680 75.8800 4.6160 19.0500
[73] 831.9360 97.0400 72.7840 1.2480 9.7080 27.2400 19.3000 208.1600
[81] 16.7400 14.9000 21.3900 200.9840 230.3760 301.9600 19.9900 6.1600
[89] 158.3680 20.1000 73.5840 6.4800 12.9600 53.3400 32.9600 5.6820
[97] 96.5300 51.3120 77.8800 64.6240 95.9760 1.7880 23.9200 238.8960
[105] 102.3600 36.8820 74.1120 27.9920 3.3040 339.9600 41.9600 75.9600
[113] 27.2400 40.0960 4.7200 23.9760 130.4640 787.5300 157.7940 47.0400
[121] 30.8400 226.5600 115.0200 68.0400 600.5580 617.7000 2.3880 243.9920
[129] 81.4240 238.5600 59.9700 78.3040 21.4560 20.0400 35.4400 11.5200
[137] 4.0200 76.1760 65.8800 43.1200 82.8000 8.8200 10.8600 143.7000
[145] 839.4300 671.9300 93.8880 384.4500 149.9700 1951.8400 171.5500 157.9200
[153] 203.1840 58.3800 105.5200 80.8800 6.6300 457.5680 14.6200 944.9300
[161] 5.9800 54.3840 28.4000 27.6800 9.9360 8159.9520 275.9280 1740.0600
[169] 32.0640 177.9800 143.9760 20.9400 110.9600 340.1440 52.4480 20.1600
[177] 97.2640 396.8020 15.8800 3.2800 24.8160 408.7440 503.9600 149.9500
[185] 29.0000 7.1600 176.8000 37.2240 20.0160 899.1360 71.7600 51.8400
[193] 626.3520 19.9000 14.2800 7.4080 6.0480 46.2600 2.9460 16.0560
[201] 21.7440 218.7500 2.6000 66.2840 35.1680 444.7680 83.9200 131.9800
[209] 15.9200 52.2900 91.9900 20.8000 23.6800 452.4500 62.9820 1188.0000
[217] 89.5840 93.0600 302.3760 5.5840 22.7040 19.7760 72.7040 479.9880
[225] 27.1680 2.2000 622.4500 21.9800 161.5680 389.6960 18.6480 233.8600
[233] 620.6145 5.3280 258.0720 617.9760 10.5600 25.9200 419.6800 11.6880
[241] 31.9840 177.2250 4.0440 7.4080 2001.8600 166.7200 47.8800 1503.2500
[249] 25.9200 321.5680 7.6100 3347.3700 80.5800 361.9200 12.1320 82.3680
[257] 53.9200 647.9040 20.3700 221.5500 17.5200 1.6240 3059.9820 2519.9580
[265] 328.2240 79.9000 14.0160 7.5600 37.2080 57.5760 725.8400 209.9300
[273] 5.2800 10.9200 8.8200 5.9800 11.6480 18.1760 59.7120 24.8400
[281] 2.0800 1114.4000 1038.8400 141.7600 239.8000 31.1040 254.0580 194.5280
[289] 961.4800 19.0960 18.4960 255.9840 86.9700 300.4160 230.3520 218.3520
[297] 78.6000 27.5520 32.4000 1082.4800 56.9100 77.6000 14.2800 219.0750
[305] 26.8000 9.8400 45.4800 289.2000 4.8900 15.1360 466.7680 15.2320
[313] 6.2640 87.5400 178.3840 15.5520 99.1360 135.8820 3991.9800 275.9400
[321] 360.0000 43.5700 7.1600 251.5200 99.9900 15.9920 290.8980 54.2240
[329] 786.7440 100.2400 37.7640 82.8000 20.7240 4.8960 4.7520 959.9840
[337] 14.3680 7.7120 698.3520 4.9600 17.8560 509.9700 30.9920 71.9280
[345] 88.8000 47.9760 7.5600 24.5600 12.9600 6.7900 24.5600 3.0480
[353] 49.1200 4355.1680 388.7040 8.2600 17.0400 34.4000 36.2400 647.8400
[361] 20.7000 20.7000 488.6460 5.5600 47.1200 211.9600 23.2000 7.3600
[369] 104.7900 1043.9200 25.9200 53.4240 8.1600 1023.9360 9.2400 479.0400
[377] 99.1360 1488.4240 8.6520 23.8320 12.1760 50.9600 49.5360 41.9000
[385] 375.4575 83.9760 482.3400 2.9600 2.6240 23.3600 39.9800 246.3840
[393] 1799.9700 12.4620 75.7920 49.9600 12.9600 70.1200 35.9520 2396.2656
[401] 131.1360 57.5840 9.5680 39.0720 35.9100 179.9500 1199.9760 27.1500
[409] 1004.0240 9.6800 28.3500 55.9800 1336.8290 113.5680 139.8600 307.1360
[417] 95.9200 383.8000 5.7800 9.3200 15.2500 196.7520 56.5600 32.7000
[425] 866.4000 28.4000 287.9200 69.9900 6.6720 189.5880 408.7440 291.9600
[433] 4.7680 714.3000 4.8120 247.8000 1007.9790 313.4880 31.8720 207.8460
[441] 12.2200 194.9400 70.9500 91.3600 242.9400 22.0500 2.9100 59.5200
[449] 161.9400 263.8800 30.4800 9.8400 35.1200 284.3640 665.4080 63.8800
[457] 129.5680 747.5580 8.9280 103.9200 899.9100 51.3120 23.5600 1272.6300
[465] 28.4850 185.3760 78.2720 254.7440 205.3328 4.7880 55.4800 340.9200
[473] 222.6660 703.9680 92.5200 62.6500 94.8500 95.7600 40.2000 14.7000
[481] 704.2500 9.0900 5.9600 159.9800 29.6000 514.1650 279.9600 2735.9520
[489] 7.9920 63.9840 70.3680 449.1500 11.0700 93.9800 189.8820 105.4200
[497] 119.6160 255.7600 241.5680 69.3000 22.6200 14.9520 801.5680 2.3760
[505] 32.7920 15.9200 2.7400 8.3400 46.7400 6354.9500 126.3000 38.0400
[513] 7.1520 6.6300 5.8800 2999.9500 51.4500 11.9600 1126.0200 18.3920
[521] 129.5680 14.1120 210.9800 55.1760 66.2600 22.2000 683.9520 45.6960
[529] 36.3360 666.2480 52.5120 190.7200 47.9400 979.9500 22.7500 16.7680
[537] 42.6160 10.7520 152.9400 283.9200 468.9000 380.8640 646.7760 58.1120
[545] 100.7920 66.1120 41.2800 13.3600 250.2720 11.3640 8.7200 1121.5680
[553] 34.5040 10.8240 1295.7800 19.4560 20.7000 1335.6800 32.4000 42.6000
[561] 84.0560 13.0000 13.1280 3.9600 2.6100 374.3760 91.8400 81.0880
[569] 19.4400 451.1520 72.4500 13.9600 33.2640 14.8500 8.8200 160.7200
[577] 19.9200 7.3000 69.7120 8.7920 51.5200 470.3760 105.5840 31.1520
[585] 6.7830 406.3680 70.9800 294.9300 84.7840 20.7360 16.8210 10.3680
[593] 9.3440 31.2000 76.1200 1199.9760 445.9600 327.7600 11.6320 143.9820
[601] 494.3760 5.8400 142.7760 45.6960 7.2180 43.1880 131.9040 3.2820
[609] 21.1680 55.1880 369.5760 15.7120 8.4480 728.9460 119.9400 3.6480
[617] 40.4800 9.9400 107.4240 37.9100 88.0200 8.6900 301.9600 555.2100
[625] 523.4800 161.8200 35.5600 97.1600 15.2400 13.2300 243.3840 119.8000
[633] 300.7680 17.8800 235.9440 392.9400 18.8820 122.3280 1049.2000 15.4240
[641] 18.8400 330.4000 26.2500 132.5200 6.4800 209.3000 31.5600 30.1440
[649] 14.8000 302.3760 316.0000 379.4000 97.8200 103.1200 113.5520 3.3180
[657] 134.2880 701.3720 2.3080 999.4320 724.0800 918.7850 2.7240 459.9500
[665] 10.7400 23.7600 85.0560 381.5760 30.3600 23.9760 108.9250 36.3520
[673] 19.5600 61.4400 38.9000 99.3900 2.6880 27.8160 82.5240 182.9940
[681] 14.3520 64.9600 68.6000 7999.9800 167.4400 479.9700 14.6200 19.4400
[689] 191.9840 104.0100 284.8200 36.8400 166.2400 33.4000 198.2720 47.3600
[697] 200.9840 97.6960 2.6960 18.5880 4.8960 15.0720 209.8800 369.9120
[705] 10.3680 166.8400 15.2160 119.9600 883.9200 46.7200 55.4800 24.4480
[713] 281.3400 307.9800 299.9700 19.9200 9.9400 103.0560 59.8080 73.3200
[721] 146.8200 1652.9400 296.3700 129.9200 45.5840 17.5680 55.9920 182.7200
[729] 400.0320 33.6300 542.6460 6.3000 242.9400 179.9700 99.6960 27.9360
[737] 84.9800 18.7200 49.9800 11.7840 272.7360 3.5400 51.5200 3.5280
[745] 4.6240 55.1680 567.1200 359.3200 11.9920 58.0500 157.7400 56.9800
[753] 2.8800 1199.9760 79.9200 383.4380 24.5600 119.8000 13.1280 22.7200
[761] 58.3200 12.3900 107.9820 11.3600 50.9400 646.7400 5.6400 572.5800
[769] 310.8800 641.9600 18.2800 207.0000 32.3500 7.7100 40.3000 34.5800
[777] 32.7600 544.0080 59.9400 23.9200 4.2800 32.0700 24.0000 35.4900
[785] 47.9840 186.6900 17.4560 348.9280 143.9600 15.4200 43.0400 332.9400
[793] 1363.9600 9.9600 21.7200 20.1600 132.7900 12.9600 21.5600 283.9200
[801] 22.2300 215.9680 355.3200 12.9600 18.2800 43.1760 1983.9680 28.4000
[809] 149.9700 11.5200 1298.5500 213.9200 25.7800 18.2800 1399.9300 51.8400
[817] 5.3440 41.4720 3.1680 1228.4650 31.0860 335.5200 239.9700 9.8200
[825] 67.8000 167.9700 35.0000 37.2400 15.2800 301.9600 180.6600 191.9800
[833] 65.9900 35.2160 23.6960 265.4750 51.1840 9.6640 21.0720 60.4500
[841] 11.5200 186.0480 37.4400 26.9760 11.3600 14.6200 83.7200 287.9400
[849] 48.8960 115.3600 5.1600 38.8800 185.8800 44.4600 242.9400 39.9600
[857] 102.3000 21.3600 7.6100 7.1600 7.3600 23.1000 191.4720 5.2480
[865] 59.1840 2.8900 51.9400 15.9360 44.9100 1141.4700 280.7820 34.4400
[873] 11.3600 106.3440 192.1600 322.5900 29.9900 371.9700 5.8920 68.4720
[881] 1242.9000 30.8400 13.4800 31.4000 17.4600 13.9440 83.7600 37.6600
[889] 34.6800 149.9500 51.3120 4.5400 15.9200 543.9200 155.8200 70.0080
[897] 15.6480 103.6000 46.9600 8.9040 10.4400 18.3360 323.9760 20.0400
[905] 64.9600 12.9600 323.1360 90.9300 52.7760 1199.8000 1928.7800 352.3800
[913] 22.2000 46.9400 143.7300 99.9180 797.9440 8.5680 149.3520 12.9920
[921] 24.5600 85.1400 21.9900 406.6000 841.5680 15.5520 252.0000 46.2000
[929] 28.8400 14.5920 89.8560 13.8720 12.1920 45.0560 29.7180 15.5520
[937] 447.6960 159.9900 12.9600 134.4800 17.1200 6.0960 1114.2720 32.4000
[945] 540.5700 167.7600 393.1650 516.4880 1007.2320 2065.3200 15.5520 25.3440
[953] 25.4720 27.1680 78.8528 173.8000 29.5920 4.7520 15.5520 204.6000
[961] 321.5680 6.2400 21.8800 4.6080 9.8200 35.9700 12.9600 191.6000
[969] 8.6400 501.8100 127.1040 124.2000 18.5880 30.0720 160.9300 75.7920
[977] 1.0800 3059.9820 3.2820 34.0200 599.2920 3.3920 559.9840 603.9200
[985] 7.9680 27.9680 336.5100 1.1120 520.0500 17.9700 1166.9200 14.6240
[993] 10.2300 154.9000 2715.9300 617.9700 10.6700 36.6300 24.1000 33.1100
[ reached getOption("max.print") -- omitted 8994 entries ]
Order ID,
Customer Name, and Sales.df_subset <- df[1:15, c("Order.ID", "Customer.Name", "Sales")]
print(df_subset)
nrow() and ncol() functions to
determine the number of rows and columns in the dataframe.nrow(df)
[1] 9994
ncol(df)
[1] 21
Profit
is greater than 100.library(dplyr)
filtered_profit <- df %>% filter(Profit > 100)
print(filtered_profit)
df[df$Profit > 100,]
Category is "Furniture" and the
Sales are greater than 500.filtered_furntiture_sales <- df %>% filter(Category == "Furniture" & Sales > 500)
print(filtered_furntiture_sales)
df[df$Category == "Furniture" & df$Sales > 500, ]
Region
is "West" and the Quantity is greater than
5.filtered_west_quantity <- df %>% filter(Region == "West" & Quantity > 5)
print(filtered_west_quantity)
df[df$Region == "West" & df$Quantity > 5, ]
Profit Margin that calculates
the profit margin as (Profit / Sales) * 100.df <- transform(df,
'Profit.Margin' = (Profit/Sales) * 100
)
head(df)
Sales column by rounding the values to 2
decimal places.df$Sales <- round(df$Sales, 2)
head(df)
Postal Code column from the dataframe using
the subset() or select() function.df <- select(df, -Postal.Code)
head(df)
is.na() function. Are there any missing values?sum(is.na(df))
[1] 0
na.omit() function.df <- na.omit(df)
Sales column with the
mean of the Sales column using the na.fill()
function.library(zoo)
df$Sales <- na.fill(df$Sales, fill = mean(df$Sales))
Region and calculate the total
Sales and Profit for each region.df_grouped <- df %>%
group_by(Region) %>%
summarise(
Total.Sales=sum(Sales),
Total.Profit=sum(Profit)
)
print(df_grouped)
Discount Level that
categorizes the Discount column into:
"Low" (0-0.2)"Medium" (0.2-0.5)"High" (0.5-1)df <- transform (df,
Discount.Level= case_when(
Discount >=0 & Discount <=0.2 ~ "Low",
Discount >=0.2 & Discount <=0.5 ~ "Medium",
Discount >=0.5 & Discount <=1 ~ "High",
TRUE ~ "Unknown" #In case of any other values outside the range
))
head(df)
Sales in descending order.df_sorted <- df %>%
arrange(desc(Sales))
head(df_sorted)